{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Factor Model of Asset Return\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import sys\n",
    "!{sys.executable} -m pip install -r requirements.txt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "import time\n",
    "import os\n",
    "import quiz_helper\n",
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "plt.style.use('ggplot')\n",
    "plt.rcParams['figure.figsize'] = (14, 8)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### data bundle"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import quiz_helper\n",
    "from zipline.data import bundles"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "os.environ['ZIPLINE_ROOT'] = os.path.join(os.getcwd(), '..', '..','data','module_4_quizzes_eod')\n",
    "ingest_func = bundles.csvdir.csvdir_equities(['daily'], quiz_helper.EOD_BUNDLE_NAME)\n",
    "bundles.register(quiz_helper.EOD_BUNDLE_NAME, ingest_func)\n",
    "print('Data Registered')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Build pipeline engine"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from zipline.pipeline import Pipeline\n",
    "from zipline.pipeline.factors import AverageDollarVolume\n",
    "from zipline.utils.calendars import get_calendar\n",
    "\n",
    "universe = AverageDollarVolume(window_length=120).top(500) \n",
    "trading_calendar = get_calendar('NYSE') \n",
    "bundle_data = bundles.load(quiz_helper.EOD_BUNDLE_NAME)\n",
    "engine = quiz_helper.build_pipeline_engine(bundle_data, trading_calendar)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### View Data¶\n",
    "With the pipeline engine built, let's get the stocks at the end of the period in the universe we're using. We'll use these tickers to generate the returns data for the our risk model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "universe_end_date = pd.Timestamp('2016-01-05', tz='UTC')\n",
    "\n",
    "universe_tickers = engine\\\n",
    "    .run_pipeline(\n",
    "        Pipeline(screen=universe),\n",
    "        universe_end_date,\n",
    "        universe_end_date)\\\n",
    "    .index.get_level_values(1)\\\n",
    "    .values.tolist()\n",
    "    \n",
    "universe_tickers"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "len(universe_tickers)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from zipline.data.data_portal import DataPortal\n",
    "\n",
    "data_portal = DataPortal(\n",
    "    bundle_data.asset_finder,\n",
    "    trading_calendar=trading_calendar,\n",
    "    first_trading_day=bundle_data.equity_daily_bar_reader.first_trading_day,\n",
    "    equity_minute_reader=None,\n",
    "    equity_daily_reader=bundle_data.equity_daily_bar_reader,\n",
    "    adjustment_reader=bundle_data.adjustment_reader)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Get pricing data helper function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from quiz_helper import get_pricing"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## get pricing data into a dataframe"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "returns_df = \\\n",
    "    get_pricing(\n",
    "        data_portal,\n",
    "        trading_calendar,\n",
    "        universe_tickers,\n",
    "        universe_end_date - pd.DateOffset(years=5),\n",
    "        universe_end_date)\\\n",
    "    .pct_change()[1:].fillna(0) #convert prices into returns\n",
    "\n",
    "returns_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Let's look at one stock\n",
    "\n",
    " Let's look at this for just one stock.  We'll pick AAPL in this example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "aapl_col = returns_df.columns[3]\n",
    "asset_return = returns_df[aapl_col]\n",
    "\n",
    "asset_return = asset_return.rename('asset_return')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Factor returns\n",
    "Let's make up a \"factor\" by taking an average of all stocks in our list.  You can think of this as an equal weighted index of the 490 stocks, kind of like a measure of the \"market\".  We'll also make another factor by calculating the median of all the stocks.  These are mainly intended to help us generate some data to work with.  We'll go into how some common risk factors are generated later in the lessons.\n",
    "\n",
    "Also note that we're setting axis=1 so that we calculate a value for each time period (row) instead of one value for each column (assets)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "factor_return_1 = returns_df.mean(axis=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "factor_return_2 = returns_df.median(axis=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Factor exposures\n",
    "\n",
    "Factor exposures refer to how \"exposed\" a stock is to each factor.  We'll get into this more later.  For now, just think of this as one number for each stock, for each of the factors."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.linear_model import LinearRegression"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "\"\"\"\n",
    "You can run these in separate cells to see each step in detail\n",
    "But for now, just assume that we're calculating a number for each \n",
    "stock, for each factor, which represents how \"exposed\" each stock is\n",
    "to each factor. \n",
    "We'll discuss how factor exposure is calculated later in the lessons.\n",
    "\"\"\"\n",
    "lr = LinearRegression()\n",
    "X = np.array([factor_return_1.values,factor_return_2.values]).T\n",
    "y = np.array(asset_return.values)\n",
    "lr.fit(X,y)\n",
    "factor_exposure_1 = lr.coef_[0]\n",
    "factor_exposure_2 = lr.coef_[1]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Quiz 1 Contribution of Factors\n",
    "\n",
    "The sum of the products of factor exposure times factor return is the contribution of the factors.  It's also called the \"common return.\" calculate the common return of AAPL, given the two factor exposures and the two factor returns."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Answer 1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Calculate the contribution of the two factors to the return of this example asset\n",
    "common_return = # ...\n",
    "common_return = common_return.rename('common_return')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Quiz 2 Specific Return\n",
    "The specific return is the part of the stock return that isn't explained by the factors.  So it's the actual return minus the common return.  \n",
    "Calculate the specific return of the stock."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Answer 2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# TODO: calculate the specific return of this asset\n",
    "specific_return = # ...\n",
    "\n",
    "specific_return = specific_return.rename('specific_return')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Visualize the common return and specific return\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "return_components = pd.concat([common_return,specific_return],axis=1)\n",
    "return_components.head(2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "return_components.plot(title=\"asset return = common return + specific return\");\n",
    "pd.DataFrame(asset_return).plot(color='purple');"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Solution\n",
    "\n",
    "[Solution notebook](factor_model_asset_return_solution.ipynb)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
